Corpus Statistics Meet the Noun Compound: Some Empirical Results

Author

  • Mark Lauer
Abstract

A variety of statistical methods for noun compound analysis are implemented and compared. The results support two main conclusions. First, the use of conceptual association not only enables a broad coverage, but also improves the accuracy. Second, an analysis model based on dependency grammar is substantially more accurate than one based on deepest constituents, even though the latter is more prevalent in the literature.

Similar resources

Korean Compound Noun Term Analysis Based on a Chart Parsing Technique

Unlike compound noun terms in English and French, where words are separated by white space, Korean compound noun terms are not separated by white space. In addition, some compound noun terms in the real world result from a spacing error. Thus the analysis of compound noun terms is a difficult task in Korean NLP. Systems based on probabilistic and statistical information extracted from a corpus ...

Collocational Clashes in the Persian Translations of Tuesdays with Morrie

This study aimed at finding features of collocational deviations in the translations of Tuesdays with Morrie. In this direction, categories of collocations and collocational clashes, as well as causes of collocational clashes were explored. The present work investigated five Persian translations of the novel. All the books were examined completely and all possible collocational clashes were...

Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus

Responding to the need for semantic lexical resources in natural language processing applications, we examine methods to acquire noun compounds (NCs), e.g., orange juice, together with suitable fine-grained semantic interpretations, e.g., squeezed from, which are directly usable as paraphrases. We employ bootstrapping and web statistics, and utilize the relationship between NCs and paraphrasing...

Uncovering Noun-Noun Compound Relations by Gamification

Can relations described by English noun-noun compounds be adequately captured by prepositions? We attempt to answer this question in a data-driven way, using gamification to annotate a set of about a thousand noun-noun compound examples. Annotators could make a choice out of five prepositions generated with the help of paraphrases found in the Google ngram corpus. We show that there is substanti...

Paradigmatic Modifiability Statistics for the Extraction of Complex Multi-Word Terms

We here propose a new method which sets apart domain-specific terminology from common non-specific noun phrases. It is based on the observation that terminological multi-word groups reveal a considerably lesser degree of distributional variation than non-specific noun phrases. We define a measure for the observable amount of paradigmatic modifiability of terms and, subsequently, test it on bigr...

Publication date: 1995